Computer-implemented methods and computer systems configured for generating photorealistic-imitating synthetic representations of subjects

ABSTRACT

In some embodiments, an exemplary inventive computer-implemented method may include steps, performed by a processor, of: obtaining training real representations of a real subject; obtaining a training synthetic representation having a visual effect applied to a synthetic subject; training a first neural network and a second neural network by: presenting the first neural network with a training real representation and candidate meta-parameters of latent variables for the visual effect to generate a training photorealistic-imitating synthetic representation of the real subject with the visual effect; presenting the second neural network with the training photorealistic-imitating synthetic representation and the training synthetic representation to determine actual meta-parameters of the latent variables of the visual effect, where the actual meta-parameters are meta-parameters at which the second neural network has identified that the training photorealistic-imitating synthetic representation is realistic; and presenting to the first neural network another real representation and the actual meta-parameters of the latent variables of the visual effect to incorporate the visual effect into another real subject.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims benefit of U.S. provisional patent application Ser. No. 62/531,607, filed Jul. 12, 2017, which is herein incorporated by reference for all purposes.

FIELD OF THE INVENTION

Generally, the present disclosure is directed to computer-implemented methods and computer systems configured for generating photorealistic-imitating synthetic representations of subjects.

BACKGROUND

Typically, transforming visual appearance of subjects involves receiving an input from a user that identifies a desired type of visual transformation.

SUMMARY OF THE INVENTION

In some embodiments, the present invention provides for an exemplary computer-implemented method that may include at least the following steps of: obtaining, by at least one processor, a training real visual input including a plurality of training real representations of at least one portion of at least one first real subject; obtaining, by at least one processor, a training synthetic visual input including at least one training synthetic representation having at least one first visual effect applied to at least one portion of at least one synthetic subject; training, by the at least one processor, at least one first neural network and at least one second neural network by: i) presenting the at least one first neural network with at least one training real representation of the plurality of training real representations and one or more candidate meta-parameters of one or more latent variables of the at least one first visual effect to incorporate the at least one first visual effect into the at least one portion of the at least one first real subject of the at least one training real representation to generate at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect; ii) presenting the at least one second neural network with (1) the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect and (2) the at least one training synthetic representation having the at least one first visual effect applied to the at least one portion of the at least one synthetic subject to determine one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect, where the one or more actual meta-parameters are meta-parameters at which the at least one second neural network has identified the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect to be realistic; obtaining, by the at least one processor, an actual real visual input including at least one second real representation of at least one portion of at least one second real subject; obtaining, by the at least one processor, second visual effect identification data that identifies at least one second visual effect to be applied to the actual real visual input, where the at least one second visual effect corresponds to the at least one first visual effect; presenting, by the at least one processor, to the at least one first neural network, the at least one second real representation and the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect to incorporate the at least one first visual effect into the at least one portion of the at least one second real subject of the at least one second real representation to generate at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect; and causing, by the at least one processor, the at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect to be displayed on a screen of a computing device.

In some embodiments, the training real visual input, the actual real visual input, or both, are part of each respective video stream.

In some embodiments, each respective video stream is a real-time video stream.

In some embodiments, the real-time video stream is a live video stream.

In some embodiments, the at least one first neural network is a deconvolutional neural network.

In some embodiments, the at least one second neural network is a convolutional neural network.

In some embodiments, at least one of the at least one first neural network or the at least one second neural network is a TensorFlow neural network.

In some embodiments, the exemplary method may further include a step of identifying, by the at least one processor, the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect by performing backpropagating calculations through time from a binary classification loss.

In some embodiments, the training real visual input, the actual real visual input, or both, are respectively obtained by a camera component of a portable electronic device and where the at least one processor is a processor of the portable electronic device.

In some embodiments, the at least one first visual effect includes at least one of: i) a transformation of the user's face into a face of an animal, ii) a transformation of the user's face into a face of another user, iii) a race transformation, iv) a gender transformation, v) an age transformation, vi) a transformation into an object which may be closest to the user's appearance, vii) a transformation by swapping one or more parts of the user's head, viii) a transformation by making one or more drawings on one of the user's face or the user's head, ix) a transformation by deforming the user's face, x) a transformation by utilizing one or more dynamic masks, or xi) a transformation by changing the user's appearance based on one or more social characteristics.

In some embodiments, the present invention provides for an exemplary computer system that may include at least the following components: a camera component, where the camera component is configured to acquire at least one of: i) a training real visual input, or ii) an actual real visual input; at least one processor; a non-transitory computer memory, storing a computer program that, when executed by the at least one processor, causes the at least one processor to: obtain the training real visual input including a plurality of training real representations of at least one portion of at least one first real subject; obtain a training synthetic visual input including at least one training synthetic representation having at least one first visual effect applied to at least one portion of at least one synthetic subject; train at least one first neural network and at least one second neural network by: i) presenting the at least one first neural network with at least one training real representation of the plurality of training real representations and one or more candidate meta-parameters of one or more latent variables of the at least one first visual effect to incorporate the at least one first visual effect into the at least one portion of the at least one first real subject of the at least one training real representation to generate at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect; ii) presenting the at least one second neural network with (1) the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect and (2) the at least one training synthetic representation having the at least one first visual effect applied to the at least one portion of the at least one synthetic subject to determine one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect, where the one or more actual meta-parameters are meta-parameters at which the at least one second neural network has identified the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect to be realistic; obtain the actual real visual input including at least one second real representation of at least one portion of at least one second real subject; obtain second visual effect identification data that identifies at least one second visual effect to be applied to the actual real visual input, where the at least one second visual effect corresponds to the at least one first visual effect; present, to the at least one first neural network, the at least one second real representation and the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect to incorporate the at least one first visual effect into the at least one portion of the at least one second real subject of the at least one second real representation to generate at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect; and cause the at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect to be displayed on a screen of a computing device.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention, briefly summarized above and discussed in greater detail below, can be understood by reference to the illustrative embodiments of the invention depicted in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.

FIGS. 1-6D are representative of some exemplary aspects of the present invention in accordance with at least some principles of at least some embodiments of the present invention.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. The figures are not drawn to scale and may be simplified for clarity. It is contemplated that elements and features of one embodiment may be beneficially incorporated in other embodiments without further recitation.

DESCRIPTION OF EXEMPLARY EMBODIMENTS

Among those benefits and improvements that have been disclosed, other objects and advantages of this invention can become apparent from the following description taken in conjunction with the accompanying figures. Detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely illustrative of the invention that may be embodied in various forms. In addition, each of the examples given in connection with the various embodiments of the present invention is intended to be illustrative, and not restrictive.

Throughout the specification, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrases “in one embodiment” and “in some embodiments” as used herein do not necessarily refer to the same embodiment(s), though it may. Furthermore, the phrases “in another embodiment” and “in some other embodiments” as used herein do not necessarily refer to a different embodiment, although it may. Thus, as described below, various embodiments of the invention may be readily combined, without departing from the scope or spirit of the invention. Further, when a particular feature, structure, or characteristic is described in connection with an implementation, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other implementations whether or not explicitly described herein.

The term “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. In addition, throughout the specification, the meaning of “a,” “an,” and “the” include plural references. The meaning of “in” includes “in” and “on.”

It is understood that at least one aspect/functionality of various embodiments described herein can be performed in real-time and/or dynamically. As used herein, the term “real-time” is directed to an event/action that can occur instantaneously or almost instantaneously in time when another event/action has occurred. For example, the “real-time processing,” “real-time computation,” and “real-time execution” all pertain to the performance of a computation during the actual time that the related physical process (e.g., a user interacting with an application on a mobile device) occurs, in order that results of the computation can be used in guiding the physical process.

As used herein, the term “dynamically” means that events and/or actions can be triggered and/or occur without any human intervention. In some embodiments, events and/or actions in accordance with the present invention can be in real-time and/or based on a predetermined periodicity of at least one of: nanosecond, several nanoseconds, millisecond, several milliseconds, second, several seconds, minute, several minutes, hourly, several hours, daily, several days, weekly, monthly, etc.

As used herein, the term “runtime” corresponds to any behavior that is dynamically determined during an execution of a software application or at least a portion of a software application.

In some embodiments, the inventive specially programmed computing systems with associated devices are configured to operate in the distributed network environment, communicating over a suitable data communication network (e.g., the Internet, etc.) and utilizing at least one suitable data communication protocol (e.g., IPX/SPX, X.25, AX.25, AppleTalk™, TCP/IP (e.g., HTTP), etc.). Of note, the embodiments described herein may, of course, be implemented using any appropriate hardware and/or computing software languages. In this regard, those of ordinary skill in the art are well versed in the type of computer hardware that may be used, the type of computer programming techniques that may be used (e.g., object oriented programming), and the type of computer programming languages that may be used (e.g., C++, Objective-C, Swift, Java, Javascript). The aforementioned examples are, of course, illustrative and not restrictive.

As used herein, the terms “image(s)” and “image data” are used interchangeably to identify data representative of visual content which includes, but is not limited to, images encoded in various computer formats (e.g., “.jpg”, “.bmp,” etc.), streaming video based on various protocols (e.g., Real-time Streaming Protocol (RTSP), Real-time Transport Protocol (RTP), Real-time Transport Control Protocol (RTCP), etc.), recorded/generated non-streaming video of various formats (e.g., “.mov,” “.mpg,” “.wmv,” “.avi,” “.flv,” etc.), and real-time visual imagery acquired through a camera application on a mobile device.

The material disclosed herein may be implemented in software or firmware or a combination of them or as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.

In another form, a non-transitory article, such as a non-transitory computer readable medium, may be used with any of the examples mentioned above or other examples except that it does not include a transitory signal per se. It does include those elements other than a signal per se that may hold data temporarily in a “transitory” fashion such as RAM and so forth.

As used herein, the terms “computer engine” and “engine” identify at least one software component and/or a combination of at least one software component and at least one hardware component which are designed/programmed/configured to manage/control other software and/or hardware components (such as the libraries, software development kits (SDKs), objects, etc.).

Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth. In some embodiments, the one or more processors may be implemented as Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors; x86 instruction set compatible processors, multi-core, or any other microprocessor or central processing unit (CPU). In various implementations, the one or more processors may be dual-core processor(s), dual-core mobile processor(s), and so forth.

Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.

One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores,” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that actually make the logic or processor.

As used herein, the term “user” shall have a meaning of at least one user.

In some embodiments, as detailed herein, the present invention utilizes the at least one conditional generative adversarial neural network which is programmed/configured to perform visual appearance transformations. In some embodiments, as detailed herein, an exemplary inventive computer system of the present invention may acquire one or more visual representations (e.g., photographs, video, etc.) via, for example, a frontal mobile camera, a camera attached to the computer, or any other suitable camera. Then, in some embodiments, as detailed herein, the exemplary inventive computer system of the present invention may transmit the acquired visual data to a remote server for processing, or in other implementations, may process the acquired visual data in real-time on a computing device (e.g., mobile device, computer, etc.). During the processing stage, in some embodiments, as detailed herein, the exemplary inventive computer system of the present invention may be programmed/configured to encode the image(s) with a given vector of meta-parameters/latent variables (responsible for a particular visual effect), restore the original image(s), train the encoding task, and generate image(s) with the desirable effect(s). In some embodiments, the resulting image may be used as a texture in combination with at least one three-dimensional (3D) model of the user (e.g., a face model).
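By way of a non-limiting illustrative sketch of the encode/restore objective just described, the following Python snippet (using the Keras API referenced later in this disclosure) measures how well a trained generator reproduces the original image when a neutral effect vector is supplied; the zero-vector convention for “no effect,” the function name, and the L1 measure are illustrative assumptions rather than the actual training objective of the inventive system:

    # Hedged sketch: checking restoration of the original image(s) with a
    # neutral (all-zero) effect vector; names and the L1 measure are
    # illustrative assumptions, not the system's actual objective.
    import numpy as np
    from tensorflow.keras import losses

    def reconstruction_loss(generator, images, n_latent):
        neutral = np.zeros((images.shape[0], n_latent))   # "no effect" requested
        restored = generator.predict([images, neutral])   # attempt to restore originals
        return float(losses.MeanAbsoluteError()(images, restored))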

In some embodiments, examples of visual transformation effects may be, without limitation, to:

- turn the user's face into a face of an animal,
- turn the user's face into a face of another user,
- apply a race transformation,
- apply a gender transformation,
- apply an age transformation (making people look younger or older),
- choose an object which may be closest to the subject's appearance (based on the machine-learning algorithm's logic),
- swap parts of the subject's head,
- make drawings on the subject's face and/or head,
- intentionally deform the user's face,
- utilize dynamic masks (e.g., changing eye-gaze direction, eyebrow motion, opening mouth, etc.), and
- change the user's appearance based on some predefined logic (e.g., changing the user's appearance in such a way as if they were poor, rich, living in the Stone Age or in the future, astronauts, etc.).

In some embodiments, as detailed herein, the exemplary inventive computer system of the present invention may be programmed/configured to use the visual transformation effects to transform the real visual representation(s) of users and utilize the transformed visual representation(s) as users' digital/virtual masks in real-time.

For example, FIG. 1 illustrates an exemplary computer system environment incorporating certain embodiments of the present invention. As shown in FIG. 1, the inventive environment may include a user (101), who uses, for example, a first computing device (102) (e.g., mobile phone device, laptop, etc.), a server (103) and a second computing device (104) (e.g., mobile phone device, laptop, etc.). For example, the user (101) may interact with the computing device (102) by means of its camera, which may take one or a series of frames (e.g., still images, video frames, etc.), containing one or more visual representations of the user (e.g., user's head imagery, user's half-body imagery, user's full-body imagery, etc.).

In some examples, the visual representations of the user may be captured via an exemplary camera sensor-type imaging device or the like (e.g., a complementary metal oxide-semiconductor-type image sensor (CMOS) or a charge-coupled device-type image sensor (CCD)), without the use of a red-green-blue (RGB) depth camera and/or microphone-array to locate who is speaking. In other examples, an RGB-Depth camera and/or microphone-array might be used in addition to or in the alternative to the camera sensor. In some examples, the exemplary imaging device of the computing device 102 may be provided via either a peripheral eye tracking camera or as an integrated peripheral eye tracking camera in a backlight system.

In turn, the computing device (102) sends the acquired visual representations to the server (103), where images may be stored, for example, in a database (105) prior to and after processing as detailed herein. In some embodiments, another computer/server (104) may control the processing and data storing on the server (103) and database (105). For example, the computer (104) may update at least one model and/or algorithm that is/are utilized for the processing and data storing on the server (103) and database (105).

In some embodiments, the input image data (e.g., input video data) may include any appropriate type of source for video contents and may contain various video sources. In some embodiments, the contents from the input video (e.g., the video stream of FIG. 2) may include both video data and metadata. A plurality of frames may be associated with the video contents and may be provided to other modules for processing. A single picture may also be included in a frame. As shown in FIG. 2, an exemplary input video stream captured by the exemplary camera (e.g., a front camera of a mobile personal smartphone) can be divided into frames. For example, a typical video sequence is an interleaved format of one or more camera shots, and a camera take is a continuous recorded performance with a given camera setup. Camera registration, as used herein, may refer to registration of different cameras capturing video frames in a video sequence/stream; the concept of camera registration is based on the camera takes in reconstruction of video edits. By registering each camera from the incoming video frames, the original interleaved format can be separated into one or more sequences, with each corresponding to a registered camera that is aligned to the original camera setup.
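As a minimal, non-limiting sketch of dividing an input video stream into frames, the following Python snippet may be used; OpenCV (cv2) is not named in this disclosure, so its use here, along with the function name, is an assumption for illustration, and any comparable video-decoding library could be substituted:

    # Minimal sketch: splitting a captured video stream into individual frames.
    import cv2

    def split_into_frames(video_path):
        frames = []
        capture = cv2.VideoCapture(video_path)   # open the video source
        while True:
            ok, frame = capture.read()           # fetch the next frame, if any
            if not ok:
                break
            frames.append(frame)                 # each frame is a BGR image array
        capture.release()
        return frames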

In some embodiments, the inventive methods and the inventive systems of the present invention can be incorporated, partially or entirely, into a personal computer (PC), laptop computer, ultra-laptop computer, tablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smart phone, smart tablet or smart television), mobile internet device (MID), messaging device, data communication device, and so forth.

FIG. 3 illustrates an exemplary structure of an exemplary computer system programmed/configured for changing, for example, the visual representations of the user in accordance with at least some embodiments of the present invention. In some embodiments, the exemplary inventive computer system may be programmed/configured to generate photorealistic imagery with one or more visual effects. During a training phase (301-306), in some embodiments, the exemplary inventive computer system may be programmed/configured to submit/present photographic representations (e.g., image(s)) (301) and synthetic representations (e.g., image(s)) (302) to a pair of neural networks via an exemplary inventive Generator module (303) and an exemplary inventive Discriminator module (304). In some embodiments, the synthetic representations (e.g., images) may be generated by an exemplary Synthetic Representation Generating module (305). In some embodiments, examples of synthetic representation datasets that the exemplary Synthetic Representation Generating module (305) may be programmed to utilize may be obtained in part or in whole from at least one of: a dataset generated using the Blender library (an exemplary description is enclosed in Appendix A), a dataset generated using the FaceGen library (facegen.com) by Singular Inversions Inc. (Toronto, Canada), a dataset generated using the Unity 3D engine (Unity Technologies ApS, San Francisco, Calif.), a user-generated dataset, or any other similarly suitable dataset.

In some embodiments, during the training stage, one or more latent variables (306) may be passed to the exemplary inventive Generator module (303) by the exemplary Synthetic Representation Generating module (305). Examples of the latent variables (306) may be age, gender, race, and similar others as detailed herein. In some embodiments, the exemplary inventive Generator module (303) may be programmed/configured to process the real photographic representation(s) (e.g., image(s), video(s), livestream(s), etc.) (301) and inventively produce “fake” photorealistic-imitating synthetic representation(s) (e.g., image(s), video(s), livestream(s), etc.). As used herein, in at least some embodiments, the term “photorealistic-imitating” and its derivative and related terms mean representations that appear to a human eye as reproducing something and/or someone that has been captured by a camera from real life (i.e., realistic). For example, change(s) in the “fake” photorealistic-imitating synthetic representation(s) from the original photographic real representation(s) may range from random field adjustment(s) to representation(s) (e.g., image(s)) with one or more desired visual effects.
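One conventional way to pass such latent variables (306) to a generator alongside an image is to broadcast the latent vector over the spatial grid and concatenate it with the image channels. The following Keras sketch shows only that conditioning step; the layer arrangement and names are assumptions for illustration and do not reproduce the actual architecture of FIGS. 4A-4C:

    # Hedged sketch: conditioning a generator input on a vector of latent
    # variables (e.g., age, gender, race). Sizes/names are illustrative.
    from tensorflow.keras import layers

    def conditioned_input(image, latent):
        # image: (batch, H, W, 3) tensor; latent: (batch, n_latent) tensor
        h, w = image.shape[1], image.shape[2]
        latent_map = layers.RepeatVector(h * w)(latent)                    # tile over all pixels
        latent_map = layers.Reshape((h, w, latent.shape[-1]))(latent_map)  # spatial latent map
        return layers.Concatenate(axis=-1)([image, latent_map])            # append as channels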

In some embodiments, the exemplary inventive Generator module (303)'s training objective may be to increase the error rate of the exemplary inventive Discriminator module (304) (e.g., producing synthesized instances that appear to have come from the true (real) image dataset). For example, the exemplary inventive Generator module (303) may be programmed/configured to synthesize new photorealistic-imitating synthetic representations (e.g., new images). In some embodiments, the exemplary inventive Generator module (303) “wins” points when it “tricks” the exemplary inventive Discriminator module (304) into a determination that the new photorealistic-imitating synthetic representations (e.g., new images) are real. In some embodiments, the exemplary inventive Discriminator module (304) may be simultaneously taught to discriminate between instances from true/real photographic representations (e.g., real images) and the photorealistic-imitating synthetic representations (e.g., images) generated by the exemplary inventive Generator module (303). In some embodiments, training the exemplary inventive Discriminator (304) may involve presenting the exemplary inventive Discriminator (304) with samples from one or more real datasets (301) and samples synthesized by the exemplary inventive Generator module (303), and performing backpropagating calculations through time from a binary classification loss. For example, in at least some embodiments, the inventive system may utilize the Keras neural networks API (https://github.com/fchollet/keras/tree/master), written in Python and capable of running on top of TensorFlow, CNTK, or Theano, to perform backpropagating calculations. An exemplary description of Keras is enclosed herein in Appendix B.
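A minimal adversarial training step consistent with the objective just described might look as follows in the Keras API mentioned above. The stacked gan model (the generator followed by a frozen discriminator, compiled with a binary cross-entropy loss), the labeling scheme, and all names are illustrative assumptions, not the actual Generator/Discriminator modules (303)/(304):

    # Hedged sketch: one adversarial update driven by a binary classification loss.
    import numpy as np

    def train_step(generator, discriminator, gan, real_images, synth_images, latents):
        batch = real_images.shape[0]
        fakes = generator.predict([real_images, latents])
        # Discriminator learns to score reference samples 1 and generated samples 0.
        d_real = discriminator.train_on_batch(synth_images, np.ones((batch, 1)))
        d_fake = discriminator.train_on_batch(fakes, np.zeros((batch, 1)))
        # The generator "wins" when the (frozen) discriminator is tricked into
        # scoring its output as real, i.e., the stacked model trains toward label 1.
        g_loss = gan.train_on_batch([real_images, latents], np.ones((batch, 1)))
        return d_real, d_fake, g_loss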

In some embodiments, the exemplary inventive Discriminator module (304) may be simultaneously taught to discriminate between instances from the synthetic representations (e.g., synthetic representations (302)) and the photorealistic-imitating synthetic representations (e.g., images) generated by the exemplary inventive Generator module (303).

In some embodiments, the exemplary inventive Discriminator module (304) may be programmed/configured to accept input representation(s) (e.g., images) and determine whether the input came from the dataset (e.g., 301 or 302), or whether it was synthesized by the exemplary inventive Generator module (303). In some embodiments, the exemplary inventive Discriminator module (304) “wins” points when it detects real dataset values correctly, and “loses” points when it approves “fake” values or denies real dataset values.

In some embodiments, still referring to FIG. 3, during an operational phase, the component (307) of the exemplary inventive system may not be used, so that the input real photographic representations (e.g., real images) (301) are fed into the exemplary inventive Generator module (303) that would generate new photorealistic-imitating representations (e.g., new images) (308) with the desired visual appearance.

In some embodiments, the exemplary inventive Generator module (303) may be programmed/configured to be in a form of a deconvolutional neural network. In some embodiments, the exemplary inventive Discriminator module (304) may be programmed/configured to be in a form of a convolutional neural network. In some embodiments, the exemplary inventive Generator module (303) may be programmed/configured to be in a form of a TensorFlow™ neural network. In some embodiments, the exemplary inventive Discriminator module (304) may be programmed/configured to be in a form of a TensorFlow™ neural network.

In some embodiments, during the operation phase, when the exemplary inventive Generator module (303) has been trained and the configuration parameters are fixed (e.g., weights of the Generator network are determined), the exemplary inventive computer system may be programmed/configured to submit visual representations (e.g., photographs, video, etc.) incorporated with one or more visual effects to the exemplary inventive Generator module (303), which produces photorealistic-imitating synthetic representations (307). In some embodiments, the effects may be incorporated in the synthetic dataset by changing one or more parameters. For example, one or more parameters may be selected from facial expressions, face anthropometry, race in FaceGen, age in FaceGen, and any combination thereof.
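For instance, the operational phase might reduce to the following non-limiting sketch, in which the trained generator's weights stay fixed and a stored meta-parameter vector for the selected effect is applied to new input imagery; the function and variable names are illustrative assumptions:

    # Hedged sketch of the operational phase with fixed Generator weights.
    def apply_effect(generator, image, effect_latent):
        generator.trainable = False   # weights determined during training stay fixed
        # Add a batch dimension for a single image; the output is the
        # photorealistic-imitating synthetic representation with the effect.
        return generator.predict([image[None, ...], effect_latent[None, ...]])[0]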

In some embodiments, the exemplary inventive Discriminator network module (304) may be programmed to generate a generalized binary output map which matches binary classifiers (0/1) with the desired segmented parts of the image (e.g., contours around the subject image, or other similar types of segmentation).
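A discriminator that emits such a spatial map of binary classifications, rather than a single scalar, can be sketched as below (compare the “PatchGAN” idea from the GAN literature); the layer sizes and names are illustrative assumptions, and the exact segmentation scheme of module (304) is not reproduced here:

    # Hedged sketch: a convolutional discriminator producing a 0/1 output map,
    # one decision per receptive-field patch of the input image.
    from tensorflow.keras import Model, layers

    def build_map_discriminator(h=128, w=128):
        inp = layers.Input((h, w, 3))
        x = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inp)
        x = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(x)
        out = layers.Conv2D(1, 4, padding="same", activation="sigmoid")(x)  # binary map
        return Model(inp, out)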

FIGS. 4A-4C illustrate five (5) parts of an exemplary inventive architecture of an exemplary neural network of the exemplary inventive Generator module (303).

FIGS. 5A and 5B illustrate an exemplary input (301) (e.g., a real image) and a desired visual effect image (306), respectively. FIG. 5C illustrates a new photorealistic-imitating synthetic representation (308) that the exemplary inventive Generator module (303) may generate based on the inputs of FIGS. 5A and 5B.

FIGS. 6A-6D illustrate how the present invention may be utilized with one or more background subtraction techniques such as described, but not limited to, in U.S. application Ser. No. 15/962,347; each of such specific techniques is incorporated herein by reference in its entirety for such purpose.

In some embodiments, the exemplary computer engine system may be configured such that its members may communicate via one or more radio modules capable of transmitting and receiving signals using various suitable wireless communications techniques. Such techniques may involve communications across one or more wireless networks. Example wireless networks include (but are not limited to) wireless local area networks (WLANs), wireless personal area networks (WPANs), wireless metropolitan area networks (WMANs), cellular networks, and satellite networks. In communicating across such networks, the one or more radio modules may operate in accordance with one or more applicable standards in any version.

In various implementations, a final output of the present invention may be displayed on a screen which may include any television type monitor or display. In various implementations, the display may include, for example, a computer display screen, touch screen display, video monitor, television-like device, and/or a television. In various implementations, the display may be digital and/or analog. In various implementations, the display may be a holographic display. In various implementations, the display may be a transparent surface that may receive a visual projection. Such projections may convey various forms of information, images, and/or objects. For example, such projections may be a visual overlay for a mobile augmented reality (MAR) application.

Further, in some embodiments, the exemplary computer system of the present invention may be utilized for various applications which may include, but are not limited to, gaming, mobile-device games, video chats, video conferences, live video streaming, video streaming and/or augmented reality applications, mobile-device messenger applications, and other similarly suitable computer-device applications.

A person skilled in the art would understand that, without violating the principles of the present invention detailed herein, in some embodiments, the exemplary illustrative methods and the exemplary illustrative systems of the present invention can be specifically configured to be utilized in any combination with one or more techniques, methodologies, and/or systems detailed in U.S. application Ser. No. 15/881,353; each of such specific disclosures is incorporated herein by reference in its entirety for such purpose.

In some embodiments, the present invention provides for an exemplary computer-implemented method that may include at least the following steps of: obtaining, by at least one processor, a training real visual input including a plurality of training real representations of at least one portion of at least one first real subject; obtaining, by at least one processor, a training synthetic visual input including at least one training synthetic representation having at least one first visual effect applied to at least one portion of at least one synthetic subject; training, by the at least one processor, at least one first neural network and at least one second neural network by: i) presenting the at least one first neural network with at least one training real representation of the plurality of training real representations and one or more candidate meta-parameters of one or more latent variables of the at least one first visual effect to incorporate the at least one first visual effect into the at least one portion of the at least one first real subject of the at least one training real representation to generate at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect; ii) presenting the at least one second neural network with (1) the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect and (2) the at least one training synthetic representation having the at least one first visual effect applied to the at least one portion of the at least one synthetic subject to determine one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect, where the one or more actual meta-parameters are meta-parameters at which the at least one second neural network has identified the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect to be realistic; obtaining, by the at least one processor, an actual real visual input including at least one second real representation of at least one portion of at least one second real subject; obtaining, by the at least one processor, second visual effect identification data that identifies at least one second visual effect to be applied to the actual real visual input, where the at least one second visual effect corresponds to the at least one first visual effect; presenting, by the at least one processor, to the at least one first neural network, the at least one second real representation and the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect to incorporate the at least one first visual effect into the at least one portion of the at least one second real subject of the at least one second real representation to generate at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect; and causing, by the at least one processor, the at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect to be displayed on a screen of a computing device.

In some embodiments, the training real visual input, the actual real visual input, or both, are part of each respective video stream.

In some embodiments, each respective video stream is a real-time video stream.

In some embodiments, the real-time video stream is a live video stream.

In some embodiments, the at least one first neural network is a deconvolutional neural network. In some embodiments, the at least one second neural network is a convolutional neural network.

In some embodiments, at least one of the at least one first neural network or the at least one second neural network is a TensorFlow neural network.

In some embodiments, the exemplary method may further include a step of identifying, by the at least one processor, the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect by performing backpropagating calculations through time from a binary classification loss.

In some embodiments, the training real visual input, the actual real visual input, or both, are respectively obtained by a camera component of a portable electronic device and where the at least one processor is a processor of the portable electronic device.

In some embodiments, the at least one first visual effect includes at least one of: i) a transformation of the user's face into a face of an animal, ii) a transformation of the user's face into a face of another user, iii) a race transformation, iv) a gender transformation, v) an age transformation, vi) a transformation into an object which may be closest to the user's appearance, vii) a transformation by swapping one or more parts of the user's head, viii) a transformation by making one or more drawings on one of the user's face or the user's head, ix) a transformation by deforming the user's face, x) a transformation by utilizing one or more dynamic masks, or xi) a transformation by changing the user's appearance based on one or more social characteristics.

In some embodiments, the present invention provides for an exemplary computer system that may include at least the following components: a camera component, where the camera component is configured to acquire at least one of: i) a training real visual input, or ii) an actual real visual input; at least one processor; a non-transitory computer memory, storing a computer program that, when executed by the at least one processor, causes the at least one processor to: obtain the training real visual input including a plurality of training real representations of at least one portion of at least one first real subject; obtain a training synthetic visual input including at least one training synthetic representation having at least one first visual effect applied to at least one portion of at least one synthetic subject; train at least one first neural network and at least one second neural network by: i) presenting the at least one first neural network with at least one training real representation of the plurality of training real representations and one or more candidate meta-parameters of one or more latent variables of the at least one first visual effect to incorporate the at least one first visual effect into the at least one portion of the at least one first real subject of the at least one training real representation to generate at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect; ii) presenting the at least one second neural network with (1) the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect and (2) the at least one training synthetic representation having the at least one first visual effect applied to the at least one portion of the at least one synthetic subject to determine one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect, where the one or more actual meta-parameters are meta-parameters at which the at least one second neural network has identified the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect to be realistic; obtain the actual real visual input including at least one second real representation of at least one portion of at least one second real subject; obtain second visual effect identification data that identifies at least one second visual effect to be applied to the actual real visual input, where the at least one second visual effect corresponds to the at least one first visual effect; present, to the at least one first neural network, the at least one second real representation and the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect to incorporate the at least one first visual effect into the at least one portion of the at least one second real subject of the at least one second real representation to generate at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect; and cause the at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect to be displayed on a screen of a computing device.

While one or more embodiments of the present invention have been described, it is understood that these embodiments are illustrative only, and not restrictive, and that many modifications may become apparent to those of ordinary skill in the art, including that various embodiments of the inventive methodologies, the inventive systems, and the inventive devices described herein can be utilized in any combination with each other. Further still, the various steps may be carried out in any desired order (and any desired steps may be added and/or any desired steps may be eliminated).

What is claimed is:
1. A computer-implemented method, comprising: obtaining, by at least one processor, a training real visual input comprising a plurality of training real representations of at least one portion of at least one first real subject; obtaining, by at least one processor, a training synthetic visual input comprising at least one training synthetic representation having at least one first visual effect applied to at least one portion of at least one synthetic subject; wherein the at least one first visual effect comprises at least one of: i) a transformation of the user's face into a face of an animal, ii) a transformation of the user's face into a face of another user, iii) a race transformation, iv) a gender transformation, v) an age transformation, vi) a transformation into an object which may be closest to the user's appearance, vii) a transformation by swapping one or more parts of the user's head, viii) a transformation by making one or more drawings on one of the user's face or the user's head, ix) a transformation by deforming the user's face, x) a transformation by utilizing one or more dynamic masks, or xi) a transformation by changing the user's appearance based on one or more social characteristics; training, by the at least one processor, at least one first neural network to incorporate the at least one first visual effect into at least one portion of the at least one first real subject of at least one training real representation of the plurality of training real representations to generate at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect based on one or more candidate meta-parameters of one or more latent variables of the at least one first visual effect; training, by the at least one processor, at least one second neural network to determine one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect based on: (1) the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect and (2) the at least one training synthetic representation having the at least one first visual effect applied to the at least one portion of the at least one synthetic subject; wherein the one or more actual meta-parameters are meta-parameters at which the at least one second neural network has identified the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect to be realistic; obtaining, by the at least one processor, an actual real visual input comprising at least one second real representation of at least one portion of at least one second real subject; obtaining, by the at least one processor, second visual effect identification data that identifies at least one second visual effect to be applied to the actual real visual input; wherein the at least one second visual effect corresponds to the at least one first visual effect; utilizing, by the at least one processor, the at least one first neural network to incorporate the at least one first visual effect into the at least one portion of the at least one second real subject of the at least one second real representation to generate at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect, based on the at least one second real representation and the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect; and causing, by the at least one processor, the at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect to be displayed on a screen of a computing device.
2. The method of claim 1, wherein the training real visual input, the actual real visual input, or both, are part of each respective video stream.
3. The method of claim 2, wherein each respective video stream is a real-time video stream.
4. The method of claim 3, wherein the real-time video stream is a live video stream.
5. The method of claim 1, wherein the at least one first neural network is a deconvolutional neural network.
6. The method of claim 1, wherein the at least one second neural network is a convolutional neural network.
7. The method of claim 1, wherein at least one of the at least one first neural network or the at least one second neural network is a TensorFlow neural network.
8. The method of claim 1, further comprising: identifying, by the at least one processor, the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect by performing backpropagating calculations through time from a binary classification loss.
9. The method of claim 1, wherein the training real visual input, the actual real visual input, or both, are respectively obtained by a camera component of a portable electronic device and wherein the at least one processor is a processor of the portable electronic device.
10. A computer system, comprising: a camera component, wherein the camera component is configured to acquire at least one of: i) a training real visual input, or ii) an actual real visual input; at least one processor; a non-transitory computer memory, storing a computer program that, when executed by the at least one processor, causes the at least one processor to: obtain the training real visual input comprising a plurality of training real representations of at least one portion of at least one first real subject; obtain a training synthetic visual input comprising at least one training synthetic representation having at least one first visual effect applied to at least one portion of at least one synthetic subject; wherein the at least one first visual effect comprises at least one of: i) a transformation of the user's face into a face of an animal, ii) a transformation of the user's face into a face of another user, iii) a race transformation, iv) a gender transformation, v) an age transformation, vi) a transformation into an object which may be closest to the user's appearance, vii) a transformation by swapping one or more parts of the user's head, viii) a transformation by making one or more drawings on one of the user's face or the user's head, ix) a transformation by deforming the user's face, x) a transformation by utilizing one or more dynamic masks, or xi) a transformation by changing the user's appearance based on one or more social characteristics; train at least one first neural network to incorporate the at least one first visual effect into at least one portion of the at least one first real subject of at least one training real representation of the plurality of training real representations to generate at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect based on one or more candidate meta-parameters of one or more latent variables of the at least one first visual effect; train at least one second neural network to determine one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect based on: (1) the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect and (2) the at least one training synthetic representation having the at least one first visual effect applied to the at least one portion of the at least one synthetic subject; wherein the one or more actual meta-parameters are meta-parameters at which the at least one second neural network has identified the at least one first training photorealistic-imitating synthetic representation of the at least one portion of the at least one first real subject with the at least one first visual effect to be realistic; obtain the actual real visual input comprising at least one second real representation of at least one portion of at least one second real subject; obtain second visual effect identification data that identifies at least one second visual effect to be applied to the actual real visual input; wherein the at least one second visual effect corresponds to the at least one first visual effect; utilize the at least one first neural network to incorporate the at least one first visual effect into the at least one portion of the at least one second real subject of the at least one second real representation to generate at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect, based on the at least one second real representation and the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect; and cause the at least one second photorealistic-imitating synthetic representation of the at least one portion of the at least one second real subject with the at least one first visual effect to be displayed on a screen of a computing device.
11. The system of claim 10, wherein the training real visual input, the actual real visual input, or both, are part of each respective video stream.
12. The system of claim 11, wherein each respective video stream is a real-time video stream.
13. The system of claim 12, wherein the real-time video stream is a live video stream.
14. The system of claim 10, wherein the at least one first neural network is a deconvolutional neural network.
15. The system of claim 10, wherein the at least one second neural network is a convolutional neural network.
16. The system of claim 10, wherein at least one of the at least one first neural network or the at least one second neural network is a TensorFlow neural network.
17. The system of claim 10, wherein the computer program, when executed by the at least one processor, further causes the at least one processor to identify the one or more actual meta-parameters of the one or more latent variables of the at least one first visual effect by performing backpropagating calculations through time from a binary classification loss.
18. The system of claim 10, wherein the computing device is a portable electronic device.